A New Approach to Speeding Up Topic Modeling

Authors

  • Jia Zeng
  • Zhi-Qiang Liu
  • Xiao-Qin Cao
Abstract

Latent Dirichlet allocation (LDA) is a widely used probabilistic topic modeling paradigm that has recently found many applications in computer vision and computational biology. In this paper, we propose a fast and accurate batch algorithm, active belief propagation (ABP), for training LDA. Batch LDA algorithms usually require repeated scanning of the entire corpus and searching of the complete topic space. For massive corpora with a large number of topics, each training iteration of a batch LDA algorithm is therefore often inefficient and time-consuming. To accelerate training, ABP actively scans only a subset of the corpus and searches only a subset of the topic space, thereby saving a large amount of training time in each iteration. To ensure accuracy, ABP selects only those documents and topics that contribute the largest residuals within the residual belief propagation (RBP) framework. On four real-world corpora, ABP runs around 10 to 100 times faster than state-of-the-art batch LDA algorithms with comparable topic modeling accuracy.
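The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_active`, the `(D, K)` residual matrix, and the fractions `doc_frac` and `topic_frac` are all hypothetical, and the abstract does not say how residuals are computed or aggregated. The sketch only shows one plausible way to pick the documents and topics with the largest residuals.

```python
import numpy as np

def select_active(residuals, doc_frac=0.2, topic_frac=0.5):
    """Pick the documents and topics with the largest residuals.

    residuals: (D, K) non-negative array, one residual per document and topic
               (assumed to come from a previous message-passing sweep).
    Returns (active_docs, active_topics), where active_topics[i] lists the
    topic indices to update for document active_docs[i].
    """
    D, K = residuals.shape
    n_docs = max(1, int(np.ceil(doc_frac * D)))
    n_topics = max(1, int(np.ceil(topic_frac * K)))

    # Assumption: rank documents by their total residual over all topics.
    doc_scores = residuals.sum(axis=1)
    active_docs = np.argsort(-doc_scores, kind="stable")[:n_docs]

    # Within each selected document, keep the topics with the largest residuals.
    order = np.argsort(-residuals[active_docs], axis=1, kind="stable")
    active_topics = order[:, :n_topics]
    return active_docs, active_topics


# Toy residuals for 4 documents and 3 topics (made-up numbers).
R = np.array([
    [0.10, 0.20, 0.00],
    [0.90, 0.10, 0.50],
    [0.00, 0.00, 0.05],
    [0.30, 0.40, 0.20],
])
docs, topics = select_active(R, doc_frac=0.5, topic_frac=0.5)
print(docs)    # indices of the documents selected for the next sweep
print(topics)  # per selected document, the topic indices to update
```

Under these assumptions, only the selected document and topic pairs would be revisited in the next iteration, which is where the per-iteration saving described in the abstract would come from.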

Similar resources

Transmission switching cost modeling and determination candidate Lines for participation in joint energy and reserve markets

Interest in smart grids has grown considerably in recent years. The introduction of new technologies that make the network flexible and controllable is a main part of the smart grid concept and a key factor in its success. The transmission network, as a part of the system network, has drawn less attention. Transmission switching as a transmission service can release us from load shedding and remove the co...

A New Approach to the Study of Transverse Vibrations of a Rectangular Plate Having a Circular Central Hole

This study analyzes the transverse vibrations of a rectangular plate with a circular central hole under different boundary conditions and obtains the plate's natural frequencies and natural modes. To solve the problem, it is necessary to use both Cartesian and polar coordinate systems. The complexity of the method is to apply an appropria...

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

Reliability analysis of repairable systems using system dynamics modeling and simulation

The study and analysis of repairable standby systems is an important topic in reliability. Analytical techniques become very complicated and unrealistic, especially for modern complex systems. There have been attempts in the literature to develop more realistic techniques using a simulation approach for reliability analysis of systems. This paper proposes a hybrid approach called Markov system ...

UPFC Placement and Setting Optimized for Multi-objective Optimization Methods to Solve IPOPT in Pool Market

Abstract: The Unified Power Flow Controller (UPFC) is a FACTS device that plays a crucial role in simultaneously regulating active and reactive power, improving system load, and reducing congestion and cost of production. Therefore, determining the optimum location of such equipment in order to improve the performance of the network is significant. In this paper, WCA algorithm is used to locat...

Journal title:
  • CoRR

Volume: abs/1204.0170

Publication date: 2012